Tper Hcaeser Pidi Implementation of the Standard I-vector System for the Kaldi Speech Recognition Toolkit
نویسندگان
چکیده
This report describes implementation of the standard i-vector-PLDA framework for the Kaldi speech recognition toolkit. The current existing speaker recognition system implementation is based on the Subspace Gaussian Mixture Model (SGMM) technique although it shares many similarities with the standard implementation. In our implementation, we modified the code so that it mimics the standard algorithms in the ivector based speaker recognition system. The implementation is compared with the existing Kaldi recipe for speaker recognition on the NIST SRE 2008 evaluation set. The entire implementation is made available at https://github.com/idiap/kaldi-ivector under the Apache 2.0 license.
منابع مشابه
Tper Hcaeser Pidi Implementation of Vtln for Statistical Speech Synthesis
Vocal tract length normalization is an important feature normalization technique that can be used to perform speaker adaptation when very little adaptation data is available. It was shown earlier that VTLN can be applied to statistical speech synthesis and was shown to give additive improvements to CMLLR. This paper presents an EM optimization for estimating more accurate warping factors. The E...
متن کاملTper Hcaeser Pidi Application of Out-of-language Detection to Spoken-term Detection
This paper investigates the detection of English spoken terms in a conversational multi-language scenario. The speech is processed using a large vocabulary continuous speech recognition system. The recognition output is represented in the form of word recognition lattices which are then used to search required terms. Due to the potential multi-lingual speech segments at the input, the spoken te...
متن کاملTper Hcaeser Pidi Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios
This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two completely diverse acoustic scenarios: (a) for Large Vocabulary Continuous Speech Recognition (LVCSR) task over (well-resourced) English meeting data and, (b) for acoustic modeling of underresourced Afrikaans telephone data. In both cases, the performance of SGMM models is compared with a conve...
متن کاملپایهگذاری بستری نو و کارآمد در حوزه بازشناسی گفتار فارسی
Although researches in the field of Persian speech recognition claim a thirty-year-old history in Iran which has achieved considerable progresses, due to the lack of well-defined experimental framework, outcomes from many of these researches are not comparable to each other and their accurate assessment won’t be possible. The experimental framework includes ASR toolkit and speech database ...
متن کاملImplementation of a Radiology Speech Recognition System for Estonian Using Open Source Software
Speech recognition has become increasingly popular in radiology reporting in the last decade. However, developing a speech recognition system for a new language in a highly specific domain requires a lot of resources, expert knowledge and skills. Therefore, commercial vendors do not offer ready-made radiology speech recognition systems for less-resourced languages. This paper describes the impl...
متن کامل